-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Single loop for FieldfInfo processing #137967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Single loop for FieldfInfo processing #137967
Conversation
No codec should extend DeduplicatingFieldInfoCodec, instead always wrap it.
Previously both DeduplicatingFieldInfosFormat and TSDBSyntheticIdCodec.RewriteFieldInfosFormat would iterate over FieldInfos. This is now optimised by letting TSDBSyntheticIdCodec.RewriteFieldInfosFormat extend DeduplicatingFieldInfosFormat in order to let RewriteFieldInfosFormat utilise the iteration done by DeduplicatingFieldInfosFormat. Also let TSDBSyntheticIdCodec extend DeduplicateFieldInfosCodec so that we can simplify the codec wrapping to always use only one of them.
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Hi @burqen, I've created a changelog YAML for you. |
tlrx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| return codec; | ||
| String name = e.getValue().getName(); | ||
| Codec codec = e.getValue(); | ||
| return useTsdbSyntheticId ? new TSDBSyntheticIdCodec(name, codec) : new DeduplicateFieldInfosCodec(name, codec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| return useTsdbSyntheticId ? new TSDBSyntheticIdCodec(name, codec) : new DeduplicateFieldInfosCodec(name, codec); | |
| return useTsdbSyntheticId ? new TSDBSyntheticIdCodec(codec) : new DeduplicateFieldInfosCodec(codec); |
| protected DeduplicateFieldInfosCodec(String name, Codec delegate) { | ||
| super(name, delegate); | ||
| this.fieldInfosFormat = createFieldInfosFormat(delegate.fieldInfosFormat()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| protected DeduplicateFieldInfosCodec(String name, Codec delegate) { | |
| super(name, delegate); | |
| this.fieldInfosFormat = createFieldInfosFormat(delegate.fieldInfosFormat()); | |
| } | |
| protected DeduplicateFieldInfosCodec(Codec delegate) { | |
| super(delegate.getName(), delegate); | |
| this.fieldInfosFormat = createFieldInfosFormat(delegate.fieldInfosFormat()); | |
| } |
|
|
||
| protected void validateFieldInfos(FieldInfos fieldInfos) {} | ||
|
|
||
| protected FieldInfo processFieldInfo(FieldInfo fi) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| protected FieldInfo processFieldInfo(FieldInfo fi) { | |
| protected FieldInfo wrapFieldInfo(FieldInfo fi) { |
|
|
||
| private static Codec unwrappedCodec(CodecService codecService, String codecName) { | ||
| Codec codec = codecService.codec(codecName); | ||
| if (codec instanceof DeduplicateFieldInfosCodec deduplicatingCodec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be:
| if (codec instanceof DeduplicateFieldInfosCodec deduplicatingCodec) { | |
| if (codec instanceof FilterCoded filtered) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not, the delegate is hidden from us there.
fcofdez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that we should revert the change around not extending DeduplicateFieldInfosCodec in the default codecs. The reason is that Lucene would use SPI to load the Codec and it will just instantiate the codec with the no-args constructor and thus we won't get to deduplicate the fields. This only applies when a node is restarted for example and we need to read the codec from the SegmentInfo and I guess that it applies to search nodes in serverless too.
I can see that Lucene is used to load the codecs here, but they will all be wrapped afterwards. Is there some other place as well where the codecs are service loaded outside of |
Yes, in Lucene, when a commit is read from disk (see https://github.com/apache/lucene/blob/e02bdb4c3c547488342b423e1b9b2b25519bd427/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L409-L412). We kind of rely implicitly that for reading a segment with a particular codec we can use the SPI loaded codec. |
Got it! Thanks for pointing that out. I'll revert and find some other approach. |
|
Good observation @fcofdez, this indeed makes wrapping approach not effective. But it should still be possible to loop once over field info in |
|
@martijnvg The problem is that the I tried a more brute force approach
This makes me think it's a bad idea to make deduplicating codec aware of of synthetic id all together. But I cannot figure out a way to get rid of the extra loop in any other way. All my ideas center around in one way or the other inject functionality into This makes me come back to your comment @fcofdez . How important is it that the service loaded codec benefit from the deduplication? They will still be loaded, only they will not be wrapped with the deduplication. How does that weigh against this optimization that we are trying to do here? Do you have any ideas? @fcofdez , @tlrx , @martijnvg ? About PostingsFormat: |
Previously both DeduplicatingFieldInfosFormat and TSDBSyntheticIdCodec.RewriteFieldInfosFormat would iterate over FieldInfos.
This is now optimised by letting TSDBSyntheticIdCodec.RewriteFieldInfosFormat extend DeduplicatingFieldInfosFormat in order to let RewriteFieldInfosFormat utilise the iteration done by DeduplicatingFieldInfosFormat.
Also let TSDBSyntheticIdCodec extend DeduplicateFieldInfosCodec so that we can simplify the codec wrapping to always use only one of them.
Additional changes: